Project: TMDB Movie Dataset

Table of Contents

Introduction

Brief Description: This dataset contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings, budgets, revenues, and directors.

Questions to answer:

  • 1- What properties affect the revenue of the movies that have high revenues?
  • 2- What is the relationship between the budget and the revenue of a movie?
  • 3- What are the production companies of the 5 highest-revenue movies?
  • 4- What is the most profitable month over all the years?
  • 5- What has been the most active month for movie releases over the years?

Data Wrangling

Only one file is imported into the Jupyter Notebook.

Assessment

Assessment Notes:

Quality Assessment

Tidiness

Data Cleaning

Copying the Dataset

Define

Code

Test

Define

Converting:

Test

Define

Code

Test

Define

Code

Test

Define

Code

Test

Define

Code

Test

Define

Code

Test

Storing the data

Exploratory Data Analysis

Research Question 1

What properties affect the revenue of the movies that have high revenues?

  • Ans: The properties most strongly associated with revenue are the number of votes, the budget, and the popularity of the film.

Exploring the correlation between revenue and the other properties to find which properties most strongly affect the revenue of a movie.

  • This pie chart shows that the properties with the highest correlation with revenue are Budget, Vote_Count, and Popularity.

Exploring the correlation between revenue and the other properties to find which properties most strongly affect the revenue of a movie.

  • This bar chart shows in detail the correlation value of every property with the revenue.
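The correlation ranking behind these charts can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes "high revenue" means revenue above the 75th percentile (as described in the conclusions), the column names `revenue`, `budget`, `vote_count`, and `popularity` are assumed, the helper `revenue_correlations` is hypothetical, and the toy values are invented for illustration.

```python
import pandas as pd


def revenue_correlations(df, quantile=0.75):
    """Correlate numeric columns with revenue among high-revenue movies.

    Hypothetical helper: 'high revenue' is assumed to mean revenue above
    the given quantile of the revenue column.
    """
    threshold = df["revenue"].quantile(quantile)
    high_revenue = df[df["revenue"] > threshold]
    # Keep numeric columns only; text columns cannot be correlated.
    numeric = high_revenue.select_dtypes(include="number")
    # Pearson correlation of each column with revenue, strongest first.
    return numeric.corr()["revenue"].drop("revenue").sort_values(ascending=False)


# Invented toy data standing in for the cleaned dataset.
movies = pd.DataFrame({
    "original_title": [f"movie_{i}" for i in range(1, 17)],
    "revenue": [10 * i for i in range(1, 17)],
    "budget": [4 * i + (i % 3) for i in range(1, 17)],
    "vote_count": [i * i for i in range(1, 17)],
    "popularity": [0.1 * i + (i % 2) for i in range(1, 17)],
})

ranking = revenue_correlations(movies)
print(ranking)
```

The resulting Series could then be passed to a plotting call such as `ranking.plot(kind="bar")` to produce a chart like the one described above.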

Research Question 2

  • What is the relationship between the budget and the revenue of a movie?
  • There is a positive correlation between them. This means that movies with larger budgets tend to earn higher revenues.

Exploring the relationship between the budget of a movie and its revenue.

  • This scatter plot shows that budget and revenue are positively correlated, which means that the more that is put into the budget of a film, the more revenue it tends to earn.
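To put a number on the relationship shown in the scatter plot, the Pearson correlation coefficient can be computed directly. The sketch below assumes columns named `budget` and `revenue` and uses invented toy values, so the printed figure says nothing about the real dataset.

```python
import pandas as pd

# Invented toy data; the real analysis would use the cleaned dataset.
movies = pd.DataFrame({
    "budget": [10, 20, 30, 40, 50],
    "revenue": [25, 35, 70, 65, 120],
})

# Pearson correlation: +1 is a perfect positive linear relationship,
# 0 is no linear relationship, -1 is a perfect negative one.
r = movies["budget"].corr(movies["revenue"])
print(f"Pearson r between budget and revenue: {r:.2f}")
```

A positive `r` describes an association across movies. On its own it does not show that raising the budget of a given film would raise its revenue.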

Research Question 3

  • What are the production companies of the 5 highest-revenue movies?
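One way to answer this question is to take the five largest revenue values and then count how often each company appears among them. This is only a sketch under assumptions: the column names `original_title`, `revenue`, and `production_companies` are assumed, the companies are assumed to be stored as a single `|`-separated string per movie, and the titles and studio names below are invented.

```python
import pandas as pd

# Invented toy data; studio names are placeholders.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C", "D", "E", "F"],
    "revenue": [100, 900, 300, 700, 500, 800],
    "production_companies": [
        "Studio One",
        "Studio Two|Studio Three",
        "Studio Four",
        "Studio Three",
        "Studio Five|Studio Two",
        "Studio Six",
    ],
})

# The five movies with the highest revenue, largest first.
top5 = movies.nlargest(5, "revenue")

# Split the '|'-separated company string into one row per company,
# then count how many of the top five movies each company worked on.
company_counts = (
    top5["production_companies"]
    .str.split("|")
    .explode()
    .value_counts()
)
print(top5[["original_title", "revenue", "production_companies"]])
print(company_counts)
```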

Research Question 4

  • What is the most profitable month over all the years?
  • June.

Distribution of total revenue by month over the years, to find the best month of the year for revenue.

  • This bar chart shows the total revenue for each month over all the years.
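The monthly totals behind this chart can be sketched as a group-by on the release month. The column names `release_date` and `revenue` are assumptions, and the toy rows use ISO-formatted dates for clarity; the real file may store dates in another format that needs its own parsing.

```python
import pandas as pd

# Invented toy data standing in for the cleaned dataset.
movies = pd.DataFrame({
    "release_date": ["2010-06-15", "2011-06-01", "2010-12-20",
                     "2012-09-05", "2013-09-10"],
    "revenue": [500, 300, 600, 100, 150],
})
movies["release_date"] = pd.to_datetime(movies["release_date"])

# Sum revenue per calendar month (1 = January ... 12 = December),
# pooling all years together.
monthly_revenue = movies.groupby(movies["release_date"].dt.month)["revenue"].sum()

# Month number with the largest total revenue.
best_month = monthly_revenue.idxmax()
print(monthly_revenue)
print("Month with the highest total revenue:", best_month)
```

Note that this sums revenue, not profit. A month with many releases can rank high on total revenue even if its individual movies earn little.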

Research Question 5

  • What has been the most active month for movie releases over the years?
  • September

Distribution of the number of movies released in each month over the years.

  • This bar chart shows the total number of movies released in each month over all the years.
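Counting releases per month needs only the release date. A minimal sketch, assuming a `release_date` column and using invented dates:

```python
import pandas as pd

# Invented toy data standing in for the cleaned dataset.
movies = pd.DataFrame({
    "release_date": pd.to_datetime([
        "2010-09-03", "2011-09-17", "2014-09-26",
        "2010-06-15", "2012-06-08",
        "2013-12-20",
    ]),
})

# Number of movies released in each calendar month, pooling all years.
releases_per_month = movies["release_date"].dt.month_name().value_counts()

# Month name with the most releases.
busiest_month = releases_per_month.idxmax()
print(releases_per_month)
print("Busiest release month:", busiest_month)
```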

Conclusions

In this project, I analyzed the dataset to answer the questions above, which were:

  • 1- What properties affect the revenue of the movies that have high revenues?
  • 2- What is the relationship between the budget and the revenue of a movie?
  • 3- What are the production companies of the 5 highest-revenue movies?
  • 4- What is the most profitable month over all the years?
  • 5- What has been the most active month for movie releases over the years?

For the first question, I started by extracting the movies whose revenue is greater than the 75th percentile of revenue. I then computed the correlation between the revenue and the other features in this subset, and found that the properties most strongly associated with revenue are the number of votes, the budget, and the popularity of the film.

For the second question, I found that the budget has a strong positive correlation with the revenue, which suggests that the budget greatly affects the revenue: the more that is put into the budget of a film, the more revenue it tends to earn.

For the third question, I started by finding the 5 highest revenue values, then retrieving their production companies. Two production companies, Twentieth Century Fox Film Corporation and Lightstorm Entertainment, produced two of the highest-grossing films.

For the fourth question, I started by grouping the dataset by month, then calculating the total revenue for each month. June and December have the highest revenues over all the years, so they could be the best months in which to release a movie that is meant to be profitable.

For the fifth question, the most active month over all the years is September.

From questions 4 and 5, after doing this analysis and these visualizations, I could not find a clear relationship between the months with a high movie release rate and the months with high revenues. I had expected that a large number of movies in theaters would affect the profit of each individual movie: with so many films and such a variety of options, one film would not get as many attendees and views as it would if only a few films were showing. But my expectation was wrong, or at least not strongly supported.

Limitations

I faced a major limitation in this dataset that I had to deal with, and it still affected the accuracy of my results.

  • Zero values in the Revenue and Budget columns. More than half of the records have a zero value! As these two columns are central to my analysis, I could neither drop those records nor leave them as they are. So I filled the zeros with the mean budget and mean revenue of movies from the same time frame, as the best solution at hand. This is still inaccurate, and it leads to inaccurate results.
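The zero-filling step described above could look roughly like this. It is a sketch under assumptions, not the notebook's actual code: "the same time frame" is taken to mean the same release year, the column names `release_year`, `budget`, and `revenue` are assumed, and the values are invented.

```python
import numpy as np
import pandas as pd

# Invented toy data; zeros mark unknown budget or revenue.
movies = pd.DataFrame({
    "release_year": [2000, 2000, 2000, 2001, 2001],
    "budget": [10, 0, 30, 0, 40],
    "revenue": [0, 50, 70, 80, 0],
})

for col in ["budget", "revenue"]:
    # Treat zeros as missing so they do not drag the mean down.
    known = movies[col].replace(0, np.nan)
    # Mean of the known values among movies released in the same year.
    yearly_mean = known.groupby(movies["release_year"]).transform("mean")
    # Fill only the missing entries; known values are left untouched.
    movies[col] = known.fillna(yearly_mean)

print(movies)
```

Filling with a group mean gives every imputed movie in a year the same value, which shrinks the spread of both columns. With more than half of the records imputed, that is one concrete way the results above may be distorted.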